SVM Based Prediction of Bacterial Transcription Start Sites
Authors
Abstract
Identifying bacterial promoters is the key to understanding gene expression. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). Knowing the TSS position, one can predict promoter positions to within a few base pairs, and vice versa. As a route to promoter identification, we formally address the problem of TSS prediction, drawing on the RegulonDB database of known (mapped) Escherichia coli TSS locations. The accepted method of finding promoters (and therefore TSSs) is to use position weight matrices (PWMs). We use an alternative approach based on support vector machines (SVMs). In particular, we quantify performance of several SVM models versus a PWM approach, using area under the detection-error tradeoff (DET) curve as a performance metric. SVM models are shown to outperform the PWM at TSS prediction, and to substantially reduce numbers of false positives, which are the bane of this problem.
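As a rough illustration of the PWM baseline the abstract refers to, the sketch below builds a log-odds position weight matrix from a handful of aligned sites and scans a sequence for the best-scoring window. It is not the authors' implementation: the function names `build_pwm` and `scan`, the pseudocount, the uniform background of 0.25, and the toy hexamers are all assumptions made for illustration, and none of the data comes from RegulonDB.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned sites of equal length.

    Assumes a uniform background and a simple additive pseudocount;
    both are illustrative choices, not values from the paper.
    """
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def scan(sequence, pwm):
    """Return (offset, score) for every window of len(pwm) in the sequence."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

# Hypothetical aligned sites resembling a -10 hexamer (made-up toy data).
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = build_pwm(toy_sites)

# Score every window of a short made-up sequence and keep the best one.
hits = scan("GGCTATAATGCC", pwm)
best_offset, best_score = max(hits, key=lambda h: h[1])
```

Every window receives a score, so on a full genome a fixed threshold lets through many spurious high-scoring positions; this is the false-positive problem the abstract describes, and the motivation for comparing against SVM models.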
Related articles
Improved prediction of bacterial transcription start sites
MOTIVATION: Identifying bacterial promoters is an important step towards understanding gene regulation. In this paper, we address the problem of predicting the location of promoters and their transcription start sites (TSSs) in Escherichia coli. The accepted method for this problem is to use position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However ...
The Prediction of Bacterial Transcription Start Sites Using SVMs
Identifying promoters is the key to understanding gene expression in bacteria. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). In this paper, we address the problem of predicting transcription start sites in Escherichia coli. Knowing the TSS position, one can then predict the promoter position to within a few base pairs, and vice versa. The accepte...
PREDetector: a new tool to identify regulatory elements in bacterial genomes.
In the post-genomic era, the prediction of transcription factor regulons by position weight matrix-based programmes is a powerful approach to decipher biological pathways and to model regulatory networks in bacteria. The main difficulty once a regulon prediction is available is to estimate its reliability prior to starting expensive experimental validations and therefore trying to find a way h...
Prediction of Eukaryotic Translation Initiation Sites Using Machine Learning
The computational identification of translation initiation sites (TIS) is a major component of every gene prediction system, and is thus of major importance in genome annotation projects. A large number of machine learning methods have been described to identify TIS in transcripts such as mRNA, EST and cDNA sequences. In this regard, most of the prediction methods have focused on recognizing TI...
Towards accurate transcription start site prediction: a modelling approach
Promoter prediction in bacteria is a classical bioinformatics problem, where available methods for regulatory element detection exhibit a very high number of false positives. We here argue that accurate transcription start site (TSS) prediction is a complex problem, where available methods for sequence motif discovery are not in itself well adopted for solving the problem. We here instead propo...